Abstract

The traditional class of elliptical distributions is extended to allow for asym- 
metries. A completely robust dispersion matrix estimator (the 'spectral estima- 
tor') for the new class of 'generalized elliptical distributions' is presented. It 
is shown that the spectral estimator corresponds to an M-estimator proposed by 
Tyler (1983) in the context of elliptical distributions. Both the generalization of 
elliptical distributions and the development of a robust dispersion matrix esti- 
mator are motivated by the stylized facts of empirical finance. Random matrix 
theory is used for analyzing the linear dependence structure of high-dimensional 
data. It is shown that the Marcenko-Pastur law fails if the sample covariance ma- 
trix is considered as a random matrix in the context of elliptically distributed and 
heavy tailed data. But substituting the sample covariance matrix by the spectral 
estimator resolves the problem and the Marcenko-Pastur law remains valid. 

1 Motivation 



Short-term financial data usually exhibit similar properties called 'stylized facts' like, 
e.g., leptokurtosis, dependence of simultaneous extremes, radial asymmetry, volatility 
clustering, etc., especially if the log-price changes (called the 'log-returns') of stocks, 
stock indices, and foreign exchange rates are considered. Particularly, high-frequency 
data usually are non-stationary, have jumps, and are strongly dependent. Cf., e.g.,
Bouchaud, Cont, and Potters, 1998, Breymann, Dias, and Embrechts, 2003, Eberlein
and Keller, 1995, Embrechts, Frey, and McNeil, 2004 (Section 4.1.1), Engle, 1982, 
Fama, 1965, Junker and May, 2002, Mandelbrot, 1963, and Mikosch, 2003 (Chapter 
1). 

Figure 1 contains QQ-plots of GARCH(1, 1) residuals of daily log-returns of the NAS- 
DAQ and the S&P 500 indices from 1993-01-01 to 2000-06-30. It is clearly indicated 
that the normal distribution hypothesis is not appropriate for the loss parts of the dis- 
tributions whereas the Gaussian law seems to be acceptable for the profit parts. Hence 
the probability of extreme losses is higher than suggested by the normal distribution 
assumption.

Fig. 1: QQ-plots of NASDAQ (left hand) and S&P 500 (right hand) GARCH(1, 1)
residuals from 1993-01-01 to 2000-06-30 (n = 1892). 

The next picture shows the joint distribution of the GARCH residuals considered 
above.

Fig. 2: NASDAQ vs. S&P 500 GARCH(1, 1) residuals from 1993-01-01 to 2000-06-30
(n = 1892).

Except for one element all extremes occur simultaneously. The effect of simultaneous
extremes can be observed more precisely in the following picture. It shows the total
number of S&P 500 stocks whose absolute daily log-returns exceeded 10% on each
trading day from 1980-01-02 to 2003-11-26. On 19 October 1987 (i.e. 'Black Monday')
there occurred 239 extremes. This is suppressed for the sake of transparency.

Fig. 3: Number of extremes in the S&P 500 during 1980-01-02 to 2003-11-26.

The latter figure shows the concomitance of extremes. If extremes occurred
independently, the number of extremal events (no matter whether losses or profits)
should be small and nearly constant over time. Obviously, this is not the case.
Instead, one can see the October Crash of 1987 and a persistent occurrence of
extremes since the beginning of the bear market in 2000. Hence there is an increasing
tendency of simultaneous losses, which is probably due to globalization effects and
relaxed market regulation. The phenomenon of simultaneous extremes is often denoted
by 'asymptotic dependence' or 'tail dependence'.

The traditional class of elliptically symmetric distributions (Cambanis, Huang, and 
Simons, 1981, Fang, Kotz, and Ng, 1990, and Kelker, 1970) is often proposed for 
the modeling of financial data (cf., e.g., Bingham and Kiesel, 2002). But elliptical 
distributions suffer from the property of radial symmetry. The pictures above show 
that financial data are not always symmetrically distributed. For this reason the
authors rely on the assumption of generalized elliptically distributed (Frahm, 2004)
log-returns. This allows for the modeling of tail dependence and radial asymmetry.

The quintessence of modern portfolio theory is that the portfolio diversification effect
depends essentially on the covariances. But the parameters for portfolio optimization,
i.e. the mean vector and the covariance matrix, have to be estimated. Especially for
portfolio risk minimization a reliable estimate of the covariance matrix is necessary
(Chopra and Ziemba, 1993). For covariance matrix estimation one should generally
use as much of the available data as possible. But since daily log-returns and all the
more high-frequency data are not normally distributed, standard estimators like the
sample covariance matrix may be highly inefficient, leading to erroneous implications
(see, e.g., Oja, 2003 and Visuri, 2001). This is because the sample covariance matrix
is very sensitive to outliers. The smaller the distribution's tail index (Hult and
Lindskog, 2002), i.e. the heavier the tails of the log-return distributions, the higher
the estimator's variance. So the quality of the parameter estimates depends essentially
on the true multivariate distribution of log-returns.

In the following it is shown how the linear dependence structure of generalized
elliptical random vectors can be estimated robustly. More precisely, it is shown that
Tyler's (1987) robust M-estimator for the dispersion matrix $\Sigma$ of elliptically
distributed random vectors remains completely robust for generalized elliptically
distributed random vectors. This estimator is disturbed neither by asymmetries nor by
outliers, and all the available data points can be used for estimation purposes.
Further, the impact of high-dimensional (financial) data on statistical inference will
be discussed. This is done by referring to a branch of statistical physics called
'Random Matrix Theory' (Hiai and Petz, 2000 and Mehta, 1990). Random matrix theory
(RMT) is concerned with the distribution of eigenvalues of high-dimensional randomly
generated matrices. If each component of a sample is independent and identically
distributed then the distribution of the eigenvalues of the sample covariance matrix
converges to a specified law which does not depend on the specific distribution of the
sample components. The circumstances under which this result of RMT can be properly
adapted to generalized elliptically distributed data will be examined.

2 Generalized Elliptical Distributions 

It is well known that an elliptically distributed random vector $X$ can be represented
stochastically by $X =_d \mu + \mathcal{R} \Lambda U^{(k)}$, where $\mu \in \mathbb{R}^d$,
$\Lambda \in \mathbb{R}^{d \times k}$ with $r(\Lambda) = k$, $U^{(k)}$ is a
$k$-dimensional random vector uniformly distributed on the unit hypersphere $S^{k-1}$,
and $\mathcal{R}$ is a nonnegative random variable stochastically independent of
$U^{(k)}$. The positive semi-definite matrix $\Sigma := \Lambda \Lambda^T$ characterizes
the linear dependence structure of $X$ and is referred to as the 'dispersion matrix'.

Definition 1 (Generalized elliptical distribution) The $d$-dimensional random vector
$X$ is said to be 'generalized elliptically distributed' if and only if

$$X =_d \mu + \mathcal{R} \Lambda U^{(k)},$$

where $U^{(k)}$ is a $k$-dimensional random vector uniformly distributed on $S^{k-1}$,
$\mathcal{R}$ is a random variable, $\mu \in \mathbb{R}^d$, and
$\Lambda \in \mathbb{R}^{d \times k}$.

Note that the definition of generalized elliptical distributions preserves all the ordinary
components of elliptically symmetric distributions (i.e. $\mu$, $\Sigma$, and
$\mathcal{R}$). But in contrast the generating variate $\mathcal{R}$ may be negative
and, moreover, it may depend on $U^{(k)}$. It is worth pointing out that the class of
generalized elliptical distributions contains the class of skew-elliptical distributions
(Branco and Dey, 2001, and Frahm, 2004, Section 3.2).

The next figure shows once again the joint distribution of the GARCH residuals of
the NASDAQ and S&P 500 log-returns from 1993-01-01 to 2000-06-30 from Figure
2. The right hand of Figure 4 contains simulated GARCH residuals on the basis of
a generalized t-distribution. More precisely, the generating variate $\mathcal{R}$
corresponds to $\sqrt{\nu \cdot \chi^2_2 / \chi^2_\nu}$ but the number of degrees of
freedom $\nu$ depends on $U^{(2)}$, i.e.

$$\nu = 4 + 996 \cdot \left( \delta\left( \Lambda u / \|\Lambda u\|_2 , v \right) \right)^3 \quad (\|u\|_2 = 1).$$

Here $\delta$ is a function that measures the distance between
$\Lambda u / \|\Lambda u\|_2$ and the reference vector
$v = (-\cos(\pi/4), -\sin(\pi/4))$,

$$\delta(u, v) := \angle(u, v)/\pi = \arccos(u^T v)/\pi.$$

Hence, random vectors which are close to the reference vector (i.e. close to the
'perfect loss scenario') are supposed to be t-distributed with $\nu = 4$ degrees of
freedom whereas random vectors which are opposite are assumed to be nearly Gaussian
($\nu = 1000$) distributed. This is consistent with the phenomenon observed in
Figure 1. The pseudo-correlation coefficient is set to 0.78.

Fig. 4: Observed GARCH(1, 1) residuals of NASDAQ and S&P 500 (left hand) and
simulated generalized t-distributed random noise (n = 1892) (right hand).
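
The simulation described above can be sketched as follows. This is only an illustration under stated assumptions: $\Lambda$ is taken to be the Cholesky factor of the 2x2 pseudo-correlation matrix with coefficient 0.78, and the function name `simulate_generalized_t` and its parameters are hypothetical, not taken from the original study.

```python
import numpy as np

def simulate_generalized_t(n, rho=0.78, nu_min=4.0, nu_max=1000.0, seed=0):
    """Draw n bivariate points X = R * Lambda * U with direction-dependent tails."""
    rng = np.random.default_rng(seed)
    # Assumption: Lambda is the Cholesky factor of the pseudo-correlation matrix.
    sigma = np.array([[1.0, rho], [rho, 1.0]])
    lam = np.linalg.cholesky(sigma)
    # U is uniformly distributed on the unit circle.
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    u = np.column_stack([np.cos(phi), np.sin(phi)])
    lu = u @ lam.T
    direction = lu / np.linalg.norm(lu, axis=1, keepdims=True)
    # Reference vector of the 'perfect loss scenario'.
    v = np.array([-np.cos(np.pi / 4), -np.sin(np.pi / 4)])
    # delta is the angle to the reference vector, scaled to [0, 1].
    delta = np.arccos(np.clip(direction @ v, -1.0, 1.0)) / np.pi
    nu = nu_min + (nu_max - nu_min) * delta**3
    # Generating variate R = sqrt(nu * chi2_2 / chi2_nu), where nu depends on U.
    r = np.sqrt(nu * rng.chisquare(2, size=n) / rng.chisquare(nu))
    return r[:, None] * lu, nu
```

With the default arguments the degrees of freedom range from 4 (direction equal to the reference vector) to 1000 (opposite direction), as in the text.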



3 Robust Covariance Matrix Estimation 

It is well-known that the sample covariance matrix corresponds both to the moment 
estimator and to the ML-estimator for the dispersion matrix $\Sigma$ of normally distributed
data. But given any other elliptical distribution family the dispersion matrix usually 
does not correspond to the covariance matrix. Generally, robust covariance matrix 
estimation means to estimate the dispersion matrix, that is the covariance matrix up 
to a scaling constant. There are many applications like, e.g., principal components
analysis, canonical correlation analysis, linear discriminant analysis, and multivariate
regression where only the dispersion matrix is demanded (Oja, 2003). Particularly, 
by Tobin's two-fund separation theorem (Tobin, 1958) the optimal portfolio of risky 
assets does not depend on the scale of the covariance matrix. Thus in the following 
we will loosely speak of 'covariance matrix estimation' rather than of estimating the 
dispersion matrix for the sake of simplicity. 

As mentioned before, the true linear dependence structure of elliptically distributed
data generally cannot be estimated efficiently by the sample covariance matrix. In
particular, if the data stem from a regularly varying random vector, then the smaller
the tail index, i.e. the heavier the tails, the larger the estimator's variance. But in
the following it is shown that there exists a completely robust alternative to the
sample covariance matrix.

Let $X$ be a $d$-dimensional generalized elliptically distributed random vector where
$\mu$ is supposed to be known, $\Lambda \in \mathbb{R}^{d \times k}$ with
$r(\Lambda) = d$, and $P(\mathcal{R} = 0) = 0$. Further, let the unit random vector
generated by $\Lambda$ be defined as

$$S := \frac{\Lambda U^{(k)}}{\|\Lambda U^{(k)}\|_2}.$$

Due to the stochastic representation of $X$ the following relations hold,

$$\frac{X - \mu}{\|X - \mu\|_2} =_d \frac{\mathcal{R} \Lambda U^{(k)}}{\|\mathcal{R} \Lambda U^{(k)}\|_2} = \pm \frac{\Lambda U^{(k)}}{\|\Lambda U^{(k)}\|_2} = \pm S,$$

where $\pm := \mathrm{sgn}(\mathcal{R})$. The random vector $\pm S$ does not depend on
the absolute value of $\mathcal{R}$. So it is completely robust against extreme outcomes
of the generating variate. But the sign of $\mathcal{R}$ still remains and this may
still depend on $U^{(k)}$. Suppose for the moment that $\pm$ is known for each
realization of $\mathcal{R}$. Then the dispersion matrix of $X$ can be estimated
robustly via maximum-likelihood estimation using the density function of $S$, which is
only a function of $\Lambda$. This is given by the next theorem.

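Theorem 1 The spectral density function of the unit random vector generated by
$\Lambda \in \mathbb{R}^{d \times k}$ corresponds to

$$s \mapsto \psi(s) = \frac{\Gamma(d/2)}{2 \pi^{d/2}} \cdot \sqrt{\det(\Sigma^{-1})} \cdot \left( s^T \Sigma^{-1} s \right)^{-d/2}, \quad \forall\, s \in S^{d-1},$$

where $\Sigma := \Lambda \Lambda^T$.

Proof. See, e.g., Frahm, 2004, pp. 59-60. ■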

Since $\psi$ is a symmetric density function the sign of $\mathcal{R}$ does not matter
at all. Hence the ML-estimation approach works even if the data are skew-elliptically
distributed, for instance.

The desired 'spectral estimator' is given by the fixed-point equation (Frahm, 2004,
Section 4.2.2)

$$\hat\Sigma_S = \frac{d}{n} \sum_{j=1}^{n} \frac{s_j s_j^T}{s_j^T \hat\Sigma_S^{-1} s_j},$$

where $s_j := (x_j - \mu) / \|x_j - \mu\|_2$ for $j = 1, \ldots, n$. Since the solution
of the fixed-point equation is unique only up to a scaling constant, in the following it
is implicitly required that the upper left element of $\hat\Sigma_S$ corresponds to 1.

The spectral estimator $\hat\Sigma_S$ corresponds to Tyler's robust M-estimator (Tyler,
1983 and Tyler, 1987) for elliptical distributions, i.e.

$$\hat\Sigma_S = \frac{d}{n} \sum_{j=1}^{n} \frac{(x_j - \mu)(x_j - \mu)^T}{(x_j - \mu)^T \hat\Sigma_S^{-1} (x_j - \mu)}.$$



Hence Tyler's M-estimator remains completely robust within the class of generalized 
elliptical distributions. 
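
The fixed-point equation suggests a simple iteration. The following is a minimal sketch, assuming a known location vector, plain fixed-point iteration starting from the identity matrix, and the normalization that the upper left element equals 1. The function name `spectral_estimator`, the tolerance, and the iteration cap are hypothetical choices, not part of the original text.

```python
import numpy as np

def spectral_estimator(x, mu, tol=1e-8, max_iter=500):
    """Iterate Sigma <- (d/n) * sum_j s_j s_j^T / (s_j^T Sigma^{-1} s_j)."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    centered = x - mu
    # Project every observation onto the unit hypersphere (requires x_j != mu).
    s = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    sigma = np.eye(d)
    for _ in range(max_iter):
        # Quadratic forms s_j^T Sigma^{-1} s_j for all observations at once.
        q = np.einsum("ij,ij->i", s @ np.linalg.inv(sigma), s)
        new = (d / n) * (s / q[:, None]).T @ s
        # Fix the scale: the upper left element is set to 1.
        new /= new[0, 0]
        if np.linalg.norm(new - sigma, ord="fro") < tol:
            return new
        sigma = new
    return sigma
```

Because only the projected vectors $s_j s_j^T$ enter the iteration, multiplying each observation by an arbitrary nonzero scalar (positive or negative) leaves the estimate unchanged, which illustrates the robustness against the generating variate.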

The following figure shows the sample covariance matrix (left hand) of a sample
with $n = 1000$ observations and $d = 500$ dimensions drawn from a multivariate
t-distribution with $\nu = 4$ degrees of freedom. Note that the tail index of the
multivariate t-distribution corresponds to $\nu$. Each cell of the plots represents a
matrix element where the blue colored cells symbolize small numbers and the red
colored cells indicate large numbers. The true dispersion matrix is given in the middle
whereas the spectral estimate is given on the right hand.




Fig. 5: Sample covariance matrix (left hand), true covariance matrix (middle), and
spectral estimate (right hand) of multivariate t-distributed realizations ($n = 1000$,
$d = 500$, $\nu = 4$).

4 Random Matrix Theory



RMT is concerned with the distribution of the eigenvalues of high-dimensional ran- 
domly generated matrices. A random matrix is simply a matrix of random variables. 
We will consider only symmetric random matrices. Thus the corresponding eigenval- 
ues are always real. The empirical distribution function of eigenvalues is defined as 
follows. 



Definition 2 (Empirical distribution function of eigenvalues) Let $\hat\Sigma_d$ be a
$d \times d$ symmetric random matrix with eigenvalues
$\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_d$. Then the function

$$\lambda \mapsto \hat W_d(\lambda) := \frac{1}{d} \sum_{i=1}^{d} 1_{\{\hat\lambda_i \le \lambda\}}$$

is called the 'empirical distribution function of the eigenvalues' of $\hat\Sigma_d$.

Note that each eigenvalue of a random matrix in fact is random but per se not a random
variable since there is no single-valued mapping $\hat\Sigma_d \mapsto \hat\lambda_i$
($i \in \{1, \ldots, d\}$) but rather $\hat\Sigma_d \mapsto \lambda(\hat\Sigma_d)$ where
$\lambda(\hat\Sigma_d)$ denotes the set of all eigenvalues of $\hat\Sigma_d$. This can be
simply fixed by assuming that the eigenvalues
$\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_d$ are sorted either in increasing or
in decreasing order.

Theorem 2 (Marcenko and Pastur, 1967) Let $U_1^{(d)}, U_2^{(d)}, \ldots, U_n^{(d)}$
($n = 1, 2, \ldots$) be sequences of independent random vectors uniformly distributed
on the unit hypersphere $S^{d-1}$ and consider the random matrix

$$\hat\Sigma_{MP} := \frac{d}{n} \sum_{j=1}^{n} U_j^{(d)} U_j^{(d)T},$$

where its empirical distribution function of the eigenvalues is denoted by $\hat W_d$.
Suppose that $n \to \infty$, $d \to \infty$, $n/d \to q < \infty$. Then

$$\hat W_d \xrightarrow{p} F_{MP}(\,\cdot\,; q),$$

at all points where $F_{MP}$ is continuous. More precisely,
$\lambda \mapsto F_{MP}(\lambda; q) = F_{MP}^{Dir}(\lambda; q) + F_{MP}^{Leb}(\lambda; q)$
where the Dirac part is given by

$$\lambda \mapsto F_{MP}^{Dir}(\lambda; q) = \begin{cases} 1 - q, & \lambda \ge 0,\ 0 \le q < 1, \\ 0, & \text{else}, \end{cases}$$

and the Lebesgue part
$\lambda \mapsto F_{MP}^{Leb}(\lambda; q) = \int_{-\infty}^{\lambda} f_{MP}^{Leb}(x; q)\, dx$
is determined by the density function

$$\lambda \mapsto f_{MP}^{Leb}(\lambda; q) = \begin{cases} \dfrac{q}{2\pi} \cdot \dfrac{\sqrt{(\lambda_{max} - \lambda)(\lambda - \lambda_{min})}}{\lambda}, & \lambda_{min} \le \lambda \le \lambda_{max}, \\ 0, & \text{else}, \end{cases}$$

where

$$\lambda_{min,max} := \left( 1 \mp \frac{1}{\sqrt{q}} \right)^2.$$

Proof. Marcenko and Pastur, 1967. ■
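
For numerical work the Marcenko-Pastur law is easy to evaluate. The sketch below assumes the parametrization used here, i.e. $q = n/d$, unit variance, and support bounds $(1 \mp 1/\sqrt{q})^2$; the function names are hypothetical.

```python
import math

def mp_bounds(q):
    """Lower and upper bound of the Marcenko-Pastur support for q = n/d."""
    lo = (1.0 - 1.0 / math.sqrt(q)) ** 2
    hi = (1.0 + 1.0 / math.sqrt(q)) ** 2
    return lo, hi

def mp_density(lam, q):
    """Lebesgue density of the Marcenko-Pastur law at the point lam."""
    lo, hi = mp_bounds(q)
    if lam <= 0.0 or lam < lo or lam > hi:
        return 0.0
    return q / (2.0 * math.pi) * math.sqrt((hi - lam) * (lam - lo)) / lam

def mp_dirac_mass(q):
    """Point mass at zero, which is present only if q < 1 (more dimensions than data)."""
    return 1.0 - q if q < 1.0 else 0.0
```

For $q \ge 1$ the density integrates to one, whereas for $q < 1$ the Lebesgue part carries the mass $q$ and the remaining mass $1 - q$ sits at zero.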

In the following $\hat\Sigma_{MP}$ will be called 'Marcenko-Pastur operator'. The next
corollary states that the Marcenko-Pastur law $F_{MP}$ holds not only for the empirical
distribution function of eigenvalues of the Marcenko-Pastur operator but also for that
obtained by the sample covariance matrix if the data are standard normally distributed
and independent.



Corollary 3 Let $X_1, X_2, \ldots, X_n$ ($n = 1, 2, \ldots$) be sequences of independent
and standard normally distributed random vectors with uncorrelated components. Then
the empirical distribution function of the eigenvalues of

$$\frac{1}{n} \sum_{j=1}^{n} X_j X_j^T$$

converges in probability to the Marcenko-Pastur law stated in Theorem 2.

Proof. Due to the strong law of large numbers $\chi^2_d / d \to 1$ almost surely
($d \to \infty$) and thus

$$\hat\Sigma_{MP} \sim \frac{d}{n} \sum_{j=1}^{n} \frac{\chi^2_{d,j}}{d} \cdot U_j^{(d)} U_j^{(d)T} = \frac{1}{n} \sum_{j=1}^{n} X_j X_j^T.$$


Moreover, the Marcenko-Pastur law holds even if $X$ is an arbitrary random vector
with standardized i.i.d. components provided the second moment is finite (Yin, 1986).
More precisely, consider the random vector $X$ with $E(X) = \mu$ and
$Var(X) = \sigma^2 I_d$ where the components of $X$ are supposed to be stochastically
independent. Then the Marcenko-Pastur law can be applied to the empirical distribution
function of the eigenvalues of

$$\frac{1}{n} \sum_{j=1}^{n} \left( \frac{X_j - \hat\mu}{\hat\sigma} \right) \left( \frac{X_j - \hat\mu}{\hat\sigma} \right)^T = \hat\Sigma / \hat\sigma^2,$$

where $\hat\Sigma$ denotes the sample covariance matrix and

$$\hat\sigma^2 := \frac{tr(\hat\Sigma)}{d} = \frac{1}{d} \sum_{i=1}^{d} \hat\lambda_i =: \bar\lambda.$$

Hence, the Marcenko-Pastur law can virtually always be applied to the empirical
distribution function of $\hat\lambda_1 / \bar\lambda, \ldots, \hat\lambda_d / \bar\lambda$,
where the estimated eigenvalues are given by the sample covariance matrix, provided the
sample elements, i.e. the realized random vectors, consist of stochastically independent
components. But within the class of elliptical distributions this holds only for
uncorrelated normally distributed data. Hence linear independence and stochastic
independence are not equivalent for generalized elliptically distributed data. This is
because even if there is no linear dependence between the components of an elliptically
distributed random vector, another sort of nonlinear dependence caused by the generating
variate $\mathcal{R}$ generally remains.

For instance, consider the unit random vector $U^{(2)} = (U_1, U_2)$. Then

$$U_2 = \pm \sqrt{1 - U_1^2} \quad \text{(a.s.)},$$

i.e. $U_2$ depends strongly on $U_1$ though indeed the elements of $U^{(2)}$ are
uncorrelated.
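
This can be checked numerically. The following small sketch (the function name `circle_correlations` is hypothetical) draws points uniformly on the unit circle and compares the correlation of the components with the correlation of their squares.

```python
import numpy as np

def circle_correlations(n, seed=0):
    """Sample correlations of (U1, U2) and of (U1^2, U2^2) on the unit circle."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    u1, u2 = np.cos(phi), np.sin(phi)
    # The components themselves are uncorrelated ...
    linear = np.corrcoef(u1, u2)[0, 1]
    # ... but their squares are perfectly dependent, since U2^2 = 1 - U1^2.
    squared = np.corrcoef(u1**2, u2**2)[0, 1]
    return linear, squared
```

The first value fluctuates around zero whereas the second one equals -1 up to rounding errors, so the absence of linear dependence does not imply stochastic independence.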

Tail dependent random variables cannot be stochastically independent. Especially, if 
the random components of an elliptically distributed random vector are heavy tailed, 
i.e. if the generating variate is regularly varying then they possess the property of tail 
dependence (Schmidt, 2002). In that case the eigenspectrum generated by the sample 
covariance matrix may lead to erroneous implications. 

For instance, consider a sample (with sample size $n = 1000$) of 500-dimensional
random vectors where each vector element is standardized t-distributed with $\nu = 5$
degrees of freedom and stochastically independent of each other. Here the eigenspectrum
obtained by the sample covariance matrix indeed is consistent with the Marcenko-Pastur
law (upper left part of Figure 6). But if the data stem from a multivariate
t-distribution possessing the same parameters and each vector component is
uncorrelated then the eigenspectrum obtained by the sample covariance matrix does not
correspond to the Marcenko-Pastur law (upper right part of Figure 6). Actually, there
are 24 eigenvalues exceeding the Marcenko-Pastur upper bound
$\lambda_{max} = (1 + 1/\sqrt{2})^2 = 2.91$ and the largest eigenvalue corresponds to
10.33. But fortunately the eigenspectra obtained by the spectral estimator are
consistent with the Marcenko-Pastur law as indicated by the lower part of Figure 6.

Fig. 6: Eigenspectra of univariate (left part) and multivariate (right part) uncorrelated
t-distributed data ($n = 1000$, $d = 500$, $\nu = 5$) obtained by the sample covariance
matrix (upper part) and by the spectral estimator (lower part).
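
The experiment behind the upper part of Figure 6 can be imitated along the following lines. This is a sketch under assumptions, not the original simulation code: the function names are hypothetical, the eigenvalues are divided by their mean as a stand-in for the standardization, and the exact numbers depend on the random seed.

```python
import numpy as np

def simulate_t_sample(n, d, nu, multivariate, seed=0):
    """Uncorrelated t-distributed data with independent or jointly t-distributed components."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, d))
    if multivariate:
        # Multivariate t: one common chi-square mixing variable per observation.
        w = rng.chisquare(nu, size=(n, 1)) / nu
    else:
        # Independent components: a separate mixing variable for every element.
        w = rng.chisquare(nu, size=(n, d)) / nu
    return z / np.sqrt(w)

def relative_eigenvalues(x):
    """Eigenvalues of the sample covariance matrix divided by their mean."""
    eig = np.linalg.eigvalsh(np.cov(x, rowvar=False))
    return eig / eig.mean()

def count_exceedances(x):
    """Number of relative eigenvalues above the Marcenko-Pastur upper bound."""
    n, d = x.shape
    upper = (1.0 + 1.0 / np.sqrt(n / d)) ** 2
    return int(np.sum(relative_eigenvalues(x) > upper))
```

Calling `count_exceedances` on samples with `multivariate=False` and `multivariate=True` (e.g. with $n = 1000$, $d = 500$, $\nu = 5$) allows one to compare how many eigenvalues leave the Marcenko-Pastur support in either case.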



Tyler (1987) shows that the spectral estimator converges strongly to the true dispersion
matrix $\Sigma$. That means

$$\frac{s_j s_j^T}{s_j^T \hat\Sigma_S^{-1} s_j} \xrightarrow{a.s.} \frac{s_j s_j^T}{s_j^T \Sigma^{-1} s_j}, \quad n \to \infty,\ d \text{ const.},$$

for $j = 1, 2, \ldots$ and P-almost all realizations. Consequently, if $\Sigma = I_d$
(up to a scaling constant) then

$$\frac{s_j s_j^T}{s_j^T \hat\Sigma_S^{-1} s_j} \xrightarrow{a.s.} \frac{s_j s_j^T}{s_j^T s_j} = U_j^{(d)} U_j^{(d)T}$$

as $n \to \infty$ and $d$ constant. Hence the spectral estimator and the Marcenko-Pastur
operator are asymptotically equivalent provided $\Sigma = \sigma^2 I_d$. The authors
believe that the strong convergence holds even for $n \to \infty$, $d \to \infty$,
$n/d \to q > 1$ for P-almost all realizations where the spectral estimate exists. The
proof of this conjecture is left to a forthcoming work. Note that for $q < 1$ the
spectral estimate does not exist at all. Further, Tyler (1987) shows that the spectral
estimate exists (a.s.) if $n > d(d - 1)$, i.e. $q > d - 1$. Indeed, this is a sufficient
condition for the existence of the spectral estimator. But in practice the spectral
estimator seems to exist in most cases when $n$ is already slightly larger than $d$.

We conclude that testing high-dimensional data for the null hypothesis
$\Sigma = \sigma^2 I_d$ by means of the sample covariance matrix may lead to wrong
conclusions provided the data are generalized elliptically distributed. In contrast,
the spectral estimator seems to be a robust alternative for applying the results of RMT
in the context of generalized elliptical distributions.



5 Financial Applications 

5.1 Portfolio Risk Minimization 

In this section it is supposed that $n/d \to \infty$, i.e. from the viewpoint of RMT we
study low-dimensional problems. Let $R = (R_1, R_2, \ldots, R_d)$ be an elliptically
distributed random vector of short-term (e.g. daily) log-returns. If the fourth order
cross moments of the log-returns are finite then the elements of the sample covariance
matrix are asymptotically multivariate normally distributed. The asymptotic covariance
of each element is given by (see, e.g., Praag and Wesselman, 1989)

$$ACov(\hat\sigma_{ij}, \hat\sigma_{kl}) = (1 + \kappa) \cdot (\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}) + \kappa \cdot \sigma_{ij}\sigma_{kl},$$

where $\Sigma = [\sigma_{ij}]$ denotes the true covariance matrix of $R$ and

$$\kappa := \frac{1}{3} \cdot \frac{E(R_i^4)}{E^2(R_i^2)} - 1$$

is called the 'kurtosis parameter'. Note that the kurtosis parameter does not depend
on $i \in \{1, \ldots, d\}$. It is well-known that in the case of normality $\kappa = 0$.
A distribution with positive (or even infinite) $\kappa$ is called 'leptokurtic'.
Particularly, regularly varying distributions are leptokurtic.

It is well-known that the portfolio which minimizes the portfolio return variance (the
so-called 'global minimum variance portfolio') is given by the vector of portfolio
weights

$$w := \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^T \Sigma^{-1} \mathbf{1}}.$$
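
A minimal sketch of the weight computation, assuming the standard closed form $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^T \Sigma^{-1} \mathbf{1})$ of the global minimum variance portfolio; the function name `gmv_weights` is hypothetical.

```python
import numpy as np

def gmv_weights(sigma):
    """Global minimum variance weights for a positive definite matrix sigma."""
    ones = np.ones(sigma.shape[0])
    # Solve Sigma * x = 1 instead of inverting Sigma explicitly.
    x = np.linalg.solve(sigma, ones)
    # Normalize so that the weights sum to one.
    return x / x.sum()
```

Since the scale of `sigma` cancels in the normalization, any dispersion matrix estimate that is proportional to the covariance matrix leads to the same weights.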



Now, suppose for the sake of simplicity that $R$ is spherically distributed, i.e. that
$\mu = 0$ and $\Sigma$ is proportional to the identity matrix. Since the weights of the
global minimum variance portfolio do not depend on the scale of $\Sigma$ we may assume
$\Sigma = I_d$ w.l.o.g. Then the asymptotic covariances of the sample covariance matrix
elements are simply given by

$$ACov(\hat\sigma_{ij}, \hat\sigma_{kl}) = \begin{cases} 2 + 3\kappa, & i = j = k = l, \\ \kappa, & i = j,\ k = l,\ i \ne k, \\ 1 + \kappa, & i = k,\ j = l,\ i \ne j, \\ 0, & \text{else}. \end{cases}$$

For instance suppose that the random vector $R$ is multivariate t-distributed with
$\nu > 4$ degrees of freedom. Then the kurtosis parameter corresponds to
$\kappa = 2/(\nu - 4)$ (see, e.g., Frahm, 2004, p. 91). Hence, the smaller $\nu$ the
larger the asymptotic variances and covariances, and these quantities tend to infinity
for $\nu \searrow 4$. Further, if $\nu < 4$ the sample covariance matrix is not even
asymptotically multivariate normally distributed.
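
The case distinction follows directly from the general formula $(1 + \kappa)(\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}) + \kappa\,\sigma_{ij}\sigma_{kl}$ with $\Sigma = I_d$. A short sketch (hypothetical function names) makes this explicit for the multivariate t-distribution.

```python
def kurtosis_parameter_t(nu):
    """Kurtosis parameter kappa = 2 / (nu - 4) of the multivariate t-distribution."""
    if nu <= 4:
        raise ValueError("kappa is finite only for nu > 4")
    return 2.0 / (nu - 4.0)

def acov_sample(i, j, k, l, kappa):
    """Asymptotic covariance of sample covariance elements (i, j) and (k, l) for Sigma = I."""
    def delta(a, b):
        # Kronecker delta, i.e. the elements of the identity matrix.
        return 1.0 if a == b else 0.0
    return ((1.0 + kappa) * (delta(i, k) * delta(j, l) + delta(i, l) * delta(j, k))
            + kappa * delta(i, j) * delta(k, l))
```

For example, $\nu = 5$ gives $\kappa = 2$, so that the asymptotic variance of a main diagonal element is $2 + 3\kappa = 8$ compared to 2 in the Gaussian case.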

In contrast, the asymptotic covariance of each element of the spectral estimator (Frahm,
2004, p. 76) is given by

$$ACov(\hat\sigma_{S,ij}, \hat\sigma_{S,kl}) = \begin{cases} 4 \cdot \dfrac{d+2}{d}, & i = j = k = l, \\ 2 \cdot \dfrac{d+2}{d}, & i = j,\ k = l,\ i \ne k, \\ \dfrac{d+2}{d}, & i = k,\ j = l,\ i \ne j, \\ 0, & \text{else}. \end{cases}$$

Note that the same holds even if $R$ is not t-distributed but only generalized
elliptically distributed since $\hat\Sigma_S$ does not depend on the generating variate
of $R$. Particularly, the spectral estimator is not disturbed by the tail index of $R$.

Now one may ask when the sample covariance matrix is dominated (in a componentwise
manner) by the spectral estimator provided the data are multivariate t-distributed.
Regarding the main diagonal entries of the covariance matrix estimate this is given by

$$4 \cdot \frac{d+2}{d} < 2 \cdot \frac{\nu - 1}{\nu - 4},$$

i.e. if $\nu < 4 + 3d/(d + 4)$ the variance of the spectral estimator's main diagonal
elements is smaller than the variance of the corresponding main diagonal elements of
the sample covariance matrix, asymptotically. Concerning its off diagonal entries we
obtain

$$\frac{d+2}{d} < \frac{\nu - 2}{\nu - 4},$$

i.e. $\nu < 4 + d$. It is worth noting that several empirical studies indicate that the
tail indices of daily log-returns generally lie between 4 and 7 (see, e.g., Embrechts,
Frey, and McNeil, 2004, p. 81 and Junker and May, 2002).
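
The two thresholds can be verified numerically. The sketch below uses hypothetical function names and assumes the asymptotic variances $4(d+2)/d$ and $(d+2)/d$ for the spectral estimator as well as $2(\nu-1)/(\nu-4)$ and $(\nu-2)/(\nu-4)$ for the sample covariance matrix, i.e. $2 + 3\kappa$ and $1 + \kappa$ with $\kappa = 2/(\nu - 4)$.

```python
def spectral_variance(d, diagonal):
    """Asymptotic variance of a spectral estimator element (Sigma = I)."""
    return (4.0 if diagonal else 1.0) * (d + 2.0) / d

def sample_variance(nu, diagonal):
    """Asymptotic variance of a sample covariance element for t-distributed data."""
    if diagonal:
        return 2.0 * (nu - 1.0) / (nu - 4.0)
    return (nu - 2.0) / (nu - 4.0)

def nu_threshold(d, diagonal):
    """Tail index below which the spectral estimator has the smaller variance."""
    return 4.0 + 3.0 * d / (d + 4.0) if diagonal else 4.0 + d

def spectral_dominates(d, nu, diagonal):
    """True if the spectral estimator's asymptotic variance is smaller."""
    return spectral_variance(d, diagonal) < sample_variance(nu, diagonal)
```

For $d = 10$ the main diagonal threshold is $4 + 30/14 \approx 6.14$ and the off diagonal threshold is 14; for large $d$ the main diagonal threshold approaches 7.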

In the following the daily log-returns from 1980-01-02 to 2003-10-06 of 285 S&P 500
stocks are analyzed for studying the robustness of the spectral estimator vs. the
sample covariance matrix. The considered stocks belong to the 'survivors' of the S&P
500 composite at the last quarter of 2003. The sample size corresponds to $n = 6000$.
The total sample period is partitioned into 10 sub-periods each containing 600 daily
log-returns. Further, each sub-period is divided into 'even' and 'odd' days, i.e. there
is a sub-sample containing the 1st, 3rd, . . . , 599th log-returns and another sub-sample
with the 2nd, 4th, . . . , 600th log-returns. Hence each sub-sample contains 300 daily
log-returns of 285 stocks. Both the sample covariance matrix and the spectral estimator
are used for estimating the relative eigenspectrum of the true covariance matrix, i.e.
$\lambda_1 / \sum_{i=1}^{d} \lambda_i, \ldots, \lambda_d / \sum_{i=1}^{d} \lambda_i$, for
each even and odd sub-sample, separately. If the covariance matrix estimator is robust
against outliers then the estimated eigenspectra of each sub-sample should be similar
since even if the true eigenspectrum changes dynamically over time this must affect
both the even and the odd days, equally. The eigenspectrum obtained in the even
sub-sample can be compared with the eigenspectrum given by the odd sub-sample simply
by the differences of the ordered (relative) eigenvalues.

Fig. 7: Eigenvalue differences for each ordered eigenvalue given by the sample
covariance matrix (left hand) and by the spectral estimate (right hand).
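
The even/odd comparison described above can be sketched as follows. The function names are hypothetical and `estimator` stands for any function that maps a data matrix (observations in rows) to a covariance or dispersion matrix estimate; the original data set is not reproduced here.

```python
import numpy as np

def relative_eigenspectrum(matrix):
    """Eigenvalues in descending order, divided by their sum."""
    eig = np.sort(np.linalg.eigvalsh(matrix))[::-1]
    return eig / eig.sum()

def even_odd_differences(returns, estimator):
    """Differences of the ordered relative eigenvalues of even and odd days."""
    odd = returns[0::2]    # 1st, 3rd, ... observation of the sub-period
    even = returns[1::2]   # 2nd, 4th, ... observation of the sub-period
    return (relative_eigenspectrum(estimator(even))
            - relative_eigenspectrum(estimator(odd)))
```

For the sample covariance matrix one may pass `lambda x: np.cov(x, rowvar=False)` as `estimator`. A robust estimator should lead to differences close to zero in every sub-period.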



On the left hand of Figure 7 we see the eigenvalue differences for each of the 10
sub-periods obtained by the sample covariance matrix. Similarly, the right hand of
Figure 7 shows the eigenvalue differences given by the spectral estimate. Figure 7
indicates that the spectral estimator leads to more robust estimates of the eigenspectra
of financial data. But note that, concerning the overall eigenspectrum, the sample
covariance matrix performs well up to the 4th sub-period. This is the period which
contains the famous October Crash of 1987. In contrast, the spectral estimator is not
affected by extreme values.

Fig. 8: Eigenvalue differences for the largest 5 eigenvalues given by the sample
covariance matrix (left hand) and by the spectral estimate (right hand).



Figure 8 focuses on the differences of the 5 largest eigenvalues. It shows that the
sample covariance matrix particularly fails in estimating the largest eigenvalue. Once
again this phenomenon is caused by Black Monday, which belongs to the even sub-sample
of the 4th sub-period. Note that the largest eigenvalue of the even sub-sample exceeds
the largest eigenvalue of the odd sub-sample by almost 12 percentage points. We conclude
that although the sample covariance matrix works quite well most of the time, it is not
appropriate for measuring the linear dependence structure of financial data. This is
due to a few but extreme fluctuations on financial markets.



5.2 Principal Components Analysis 

Now, consider a $d$-dimensional vector $R = (R_1, \ldots, R_d)$ of long-term (e.g.
yearly) i.i.d. log-returns. Due to the central limit theorem each vector component of
$R$ is approximately normally distributed provided the covariance matrix of the
short-term (e.g. daily) log-returns exists and is finite. Since the sum of i.i.d.
elliptical random vectors is always elliptically distributed, too (see, e.g., Hult and
Lindskog, 2002) one may take for granted that the vector components of $R$ are
approximately jointly normally distributed. But this is not true if the number of
dimensions $d$ is large relative to the sample size $n$.

For instance, consider a $d$-dimensional random vector $X$ which is multivariate
t-distributed with $\nu > 2$ degrees of freedom, location vector $\mu = 0$, and
dispersion matrix $\Sigma = (\nu - 2)/\nu \cdot I_d$. Due to the multivariate central
limit theorem one could believe that

$$Y := \frac{1}{\sqrt{n}} \sum_{j=1}^{n} X_j \sim N_d(0, I_d),$$

where $X_1, \ldots, X_n$ are independent copies of $X$. But indeed
$Y^T Y \sim \chi^2_d$ holds only if $q := n/d$ is large rather than $n$ being large
(cf. Frahm, 2004, Section 6.2). Thus the quantity $q$ can be interpreted as 'effective
sample size'.

In the following it is assumed that $R$ is elliptically distributed with location
vector $\mu$ and dispersion matrix $\Sigma$. Let
$\Sigma = \mathcal{O} \mathcal{D} \mathcal{O}^T$ be a spectral decomposition of
$\Sigma$. Then

$$R =_d \mu + \mathcal{O} \sqrt{\mathcal{D}}\, Y,$$

where $Y$ is spherically distributed with dispersion matrix $I_d$.

We assume that the elements of $\mathcal{D}$, i.e. the eigenvalues of $\Sigma$, are
given in descending order and that the first $k$ eigenvalues are large whereas the
residual ones are small. The elements of $Y$ are called 'principal components' of $R$.
Since $\mathcal{O}$ is orthonormal the distribution of $\sqrt{\mathcal{D}}\, Y$ remains
up to a rotation in $\mathbb{R}^d$. The direction of each principal component is given
by the corresponding column of $\mathcal{O}$.

Hence the first $k$ eigenvalues correspond to the variances (up to a scaling constant)
of the 'driving risk factors' contained in the first part of $Y$, i.e.
$(Y_1, \ldots, Y_k)$. For the purpose of dimension reduction $k$ shall not be too large.
Because the $d - k$ residual risk factors contained in $(Y_{k+1}, \ldots, Y_d)$ are
supposed to have (relatively) small variances they can be interpreted as the components
of the idiosyncratic risks of each firm, i.e.

$$\varepsilon_i := \sum_{j=k+1}^{d} \sqrt{\lambda_j}\, \mathcal{O}_{ij} Y_j, \quad i = 1, \ldots, d,$$

where $\lambda_j := \mathcal{D}_{jj}$.

Thus we obtain the following principal components model for long-term log-returns,

$$R_i = \mu_i + \beta_{i1} Y_1 + \ldots + \beta_{ik} Y_k + \varepsilon_i, \quad i = 1, \ldots, d,$$

where the driving risk factors $Y_1, \ldots, Y_k$ are uncorrelated. Further, each noise
term $\varepsilon_i$ ($i = 1, \ldots, d$) is uncorrelated with $Y_1, \ldots, Y_k$, too.
But note that $\varepsilon_1, \ldots, \varepsilon_d$ are correlated, generally. The
'Betas' are given by $\beta_{ij} = \sqrt{\lambda_j}\, \mathcal{O}_{ij}$ for
$i = 1, \ldots, d$ and $j = 1, \ldots, k$.
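
Given a dispersion matrix (or an estimate thereof), the Betas and the dispersion of the noise terms follow from the spectral decomposition. The sketch below is illustrative only; the function name `principal_components_model` is hypothetical and the Betas are taken to be $\sqrt{\lambda_j}\, \mathcal{O}_{ij}$, as implied by the representation $R = \mu + \mathcal{O}\sqrt{\mathcal{D}}\,Y$.

```python
import numpy as np

def principal_components_model(sigma, k):
    """Betas of the first k principal components and dispersion of the residual part."""
    eigval, eigvec = np.linalg.eigh(sigma)
    # Sort the eigenvalues (and eigenvectors) in descending order.
    order = np.argsort(eigval)[::-1]
    lam = np.clip(eigval[order], 0.0, None)
    o = eigvec[:, order]
    # Column j of the loading matrix is sqrt(lambda_j) times the j-th eigenvector.
    loadings = o * np.sqrt(lam)
    betas = loadings[:, :k]
    residual = loadings[:, k:]
    # Dispersion of the noise terms; generally not diagonal.
    return betas, residual @ residual.T
```

By construction `betas @ betas.T` plus the residual dispersion reproduces `sigma`, and the off diagonal elements of the residual dispersion show that the noise terms are generally correlated.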

The purpose of principal components analysis is to reduce the complexity caused by 
the number of dimensions. This can be done successfully only if there is indeed a 
number of principal components accountable for the most part of the distribution. Ad- 
ditionally, the covariance matrix estimator which is used for extracting the principal 
components should be robust against outliers. 

For example, let the daily log-returns be multivariate t-distributed with $\nu$ degrees
of freedom and suppose that $d = 500$ and $n = 1000$. Note that due to the central limit
theorem the normality assumption concerning the long-term log-returns makes sense
whenever $\nu > 2$. The black lines in Figure 9 show the true proportion of the total
variation for a set of 500 eigenvalues. We see that the largest 20% of the eigenvalues
account for 80% of the overall variance. This is known in economics as the '80/20 rule'
or 'Pareto's principle'. The estimated eigenvalue proportions obtained by the sample
covariance matrix are represented by the red lines whereas the corresponding estimates
based on the spectral estimator are given by the green lines. Each line is an average
over 100 concentration curves drawn from samples of the corresponding multivariate
t-distribution.

If the data have a small tail index, as in the lower right panel of Figure 9, then the 
sample covariance matrix tends to underestimate the number of driving risk factors 
substantially. This is similar to the phenomenon observed in Figure 6 where the number 
of large eigenvalues is overestimated. In contrast, the concentration curves obtained by 
the spectral estimator are robust against heavy tails. This holds even if the long-term 
log-returns are not asymptotically normally distributed. 
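
A comparison of this kind can be sketched in a few lines. The code below is an illustration under explicit assumptions, not the procedure used for Figure 9: the spectral estimator is replaced by the well-known fixed-point iteration for Tyler's M-estimator (to which it corresponds according to the abstract), the location is assumed to be zero, the estimate is normalized to have trace d, and a 'concentration curve' is taken to mean the cumulative share of the total variation explained by the largest eigenvalues. All function names are hypothetical.

```python
import numpy as np

def sample_multivariate_t(rng, n, sigma, nu):
    """Draw n centered multivariate t vectors with dispersion sigma and nu degrees of freedom."""
    chol = np.linalg.cholesky(sigma)
    z = rng.standard_normal((n, sigma.shape[0])) @ chol.T   # N(0, sigma) rows
    g = rng.chisquare(nu, size=n) / nu                      # common scaling per observation
    return z / np.sqrt(g)[:, None]

def tyler_estimator(x, tol=1e-8, max_iter=500):
    """Fixed-point iteration for Tyler's M-estimator (requires n > d, zero location)."""
    n, d = x.shape
    sigma = np.eye(d)
    for _ in range(max_iter):
        inv = np.linalg.inv(sigma)
        w = np.einsum("ij,jk,ik->i", x, inv, x)   # quadratic forms x_i' sigma^{-1} x_i
        new = (d / n) * (x / w[:, None]).T @ x    # weighted sum of outer products
        new *= d / np.trace(new)                  # fix the scale: trace equals d
        if np.linalg.norm(new - sigma) <= tol * np.linalg.norm(sigma):
            return new
        sigma = new
    return sigma

def concentration_curve(cov):
    """Cumulative share of total variation, largest eigenvalue first."""
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return np.cumsum(lam) / lam.sum()
```

Because each observation enters only through its direction, the Tyler iterate is unaffected by rescaling individual observations, which is the source of its robustness against heavy tails. Applying `concentration_curve` to `x.T @ x / n` and to `tyler_estimator(x)` for samples from `sample_multivariate_t` gives curves that can be compared with the true one.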




[Figure 9: four panels, each with axis label 'principal component']

Fig. 9: True proportion of the total variation (black line) and proportions obtained by 
the sample covariance matrix (red lines) and by the spectral estimator (green lines). 
The samples are drawn from a multivariate t-distribution with $\nu = \infty$ (i.e. the 
multivariate normal distribution, upper left), $\nu = 10$ (upper right), $\nu = 5$ (lower left), and 
$\nu = 2$ (lower right). 

In the simulated example of Figure 9 it is assumed that the small eigenvalues are equal. 
This is equivalent to the assumption that the residual risk factors are spherically 
distributed, i.e. that they contain no more information about the linear dependence 
structure of R. But even if the true eigenvalues are equal the corresponding estimates 
will not share this property because of estimation errors. Yet it is important to know 
whether the residual risk factors carry structural information or whether the differences 
between the eigenvalue estimates are only caused by random noise. This is not an easy task, 
especially if the data are not normally distributed and the number of dimensions is large, 
which is the topic of the next section. 



5.3 Signal-Noise Separation 

In the previous section it was mentioned that the central limit theorem fails in the 
context of high-dimensional data, i.e. if n/d is small. Hence, now we leave the field 
of classical multivariate analysis and get to the domain of RMT. 

Let $\Sigma = O V O^{T} \in \mathbb{R}^{d \times d}$ be a spectral decomposition where V shall be a diagonal 
matrix containing a 'bulk' of small and equal eigenvalues and some large (but not 
necessarily equal) eigenvalues. For the sake of simplicity suppose 

$$V = \begin{bmatrix} c I_k & 0 \\ 0 & b I_{d-k} \end{bmatrix}, \qquad c > b > 0,$$


where $d - k$ is large. Hence $\Sigma$ has two different characteristic manifolds. The 'major' 
one is determined by the first k column vectors of O (the 'signal part' of $\Sigma$) whereas 
the 'minor' one is given by the $d - k$ residual column vectors of O (the 'noise part' 
of $\Sigma$). We are interested in separating signal from noise, that is, in estimating k 
properly. 

For instance, assume that n = 1000, d = 500, and that a sample consists of normally 
distributed random vectors with covariance matrix $\Sigma$, where b = 1, c = 5, and k = 
100. Using the sample covariance matrix and normalizing the eigenvalues, one 
obtains, for example, the histogram of eigenvalues given on the left-hand side of Figure 10. 
As might be expected, the Marcenko-Pastur law is not valid due to the two different 
regimes of eigenvalues. In contrast, when focusing on the smallest 400 eigenvalues, 
i.e. on the noise part of $\Sigma$, the Marcenko-Pastur law becomes valid, as we see on the 
right-hand side of Figure 10. 

[Figure 10: two histogram panels, each with axis label 'eigenvalue']

Fig. 10: Histogram of all d = 500 eigenvalues (left hand) and of the noise part (right 
hand) consisting of the $d - k = 400$ smallest eigenvalues. The Marcenko-Pastur law 
is represented by the green lines. 
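
For reference, the Marcenko-Pastur density and its support can be written down in a few lines. This is a sketch under stated assumptions: the standard form of the law for a sample covariance matrix of centered data with ratio q = n/d >= 1 and common variance `var` is used, and the eigenvalues are normalized to have mean one. Whether this matches the normalization used for Figure 10 is an assumption; the function names are hypothetical.

```python
import numpy as np

def mp_bounds(q, var=1.0):
    """Lower and upper edge of the Marcenko-Pastur support for q = n/d >= 1."""
    lam_min = var * (1.0 - 1.0 / np.sqrt(q)) ** 2
    lam_max = var * (1.0 + 1.0 / np.sqrt(q)) ** 2
    return lam_min, lam_max

def mp_density(lam, q, var=1.0):
    """Marcenko-Pastur density evaluated at lam (zero outside the support)."""
    lam = np.asarray(lam, dtype=float)
    lam_min, lam_max = mp_bounds(q, var)
    dens = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    x = lam[inside]
    dens[inside] = q / (2.0 * np.pi * var) * np.sqrt((lam_max - x) * (x - lam_min)) / x
    return dens

def normalized_sample_eigenvalues(x):
    """Eigenvalues of the sample covariance matrix of centered data, scaled to mean one."""
    n = x.shape[0]
    lam = np.linalg.eigvalsh(x.T @ x / n)
    return lam / lam.mean()
```

For the noise part of the example one would, under the assumption that the 400 smallest eigenvalues are renormalized to mean one, compare their histogram with `mp_density(grid, q=1000 / 400)`.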

Thus separating signal from noise means removing the largest eigenvalues 
successively until the residual eigenspectrum is consistent with the Marcenko-Pastur law. 
This is the case, e.g., when there are no more eigenvalues exceeding the Marcenko-Pastur 
upper bound $\lambda_{\max}$. In our case study this is the case for 397 eigenvalues (see the figure 
below), i.e. k = 103. 



[Figure 11: histogram with axis label 'eigenvalue']

Fig. 11: Histogram of the remaining 397 eigenvalues after signal-noise separation. 
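
The successive removal of the largest eigenvalues can be sketched as follows. This is one possible reading of the procedure, not necessarily the implementation behind Figure 11: after each removal the m remaining eigenvalues are renormalized to mean one and compared with the Marcenko-Pastur upper bound for q = n/m and unit variance. Both the renormalization and the function name `separate_signal_noise` are assumptions made for illustration.

```python
import numpy as np

def separate_signal_noise(eigenvalues, n):
    """Return the estimated number k of signal eigenvalues.

    eigenvalues: positive eigenvalues of a covariance (or dispersion) estimate,
    n: sample size used to compute that estimate.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending order
    d = lam.size
    for k in range(d):
        resid = lam[k:]                           # eigenvalues not yet removed
        m = resid.size
        lam_max = (1.0 + np.sqrt(m / n)) ** 2     # MP upper edge for q = n / m
        if resid[0] / resid.mean() <= lam_max:    # largest normalized residual eigenvalue
            return k
    return d
```

The function accepts eigenvalues from any estimator, so the same routine can be applied to the sample covariance matrix or to a robust alternative.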

As was shown in Section 4, this approach is promising only if the data are not 
regularly varying. Hence for financial data not the sample covariance matrix but the 
spectral estimator is proposed for a proper signal-noise separation. 

6 Conclusions 

Due to the stylized facts of empirical finance the Gaussian distribution hypothesis is 
not appropriate for the modeling of financial data. For that reason the authors rely 
on the broad class of generalized elliptical distributions. This class allows for tail 
dependence and radial asymmetry. Although the sample covariance matrix works quite 
well with financial data most of the time, it is not appropriate for measuring their 
linear dependence structure. This is due to a few but extreme fluctuations on financial 
markets. 

It is shown that there exists a completely robust ML-estimator (the 'spectral 
estimator') for the dispersion matrix of generalized elliptical distributions. This estimator 
corresponds to Tyler's M-estimator for elliptical distributions. Further, it is shown that 
the Marcenko-Pastur law fails if the sample covariance matrix is considered as a random 
matrix in the context of elliptically or even generalized elliptically distributed data. 
This is due to the fact that stochastic independence implies linear independence 
(uncorrelatedness), whereas uncorrelated random variables are not necessarily independent. 
In contrast, the Marcenko-Pastur law remains valid if the data are uncorrelated and the 
spectral estimator is considered as a random matrix. 

The robustness property of the spectral estimator can be demonstrated for several 
financial applications like, e.g., portfolio risk minimization, principal components analysis, 
and signal-noise separation. If the data are heavy tailed, principal components 
analysis tends to underestimate the number of driving risk factors if the sample 
covariance matrix is used for extracting the eigenspectrum. This means that the contribution 
of the largest eigenvalues to the total variation of the data is systematically 
overestimated. Consequently, in the context of signal-noise separation the largest eigenvalues 
are overestimated by the sample covariance matrix. This can be fixed simply by using 
the spectral estimator instead. 